A CENTRAL LIMIT THEOREM FOR REPEATING PATTERNS 



Abstract. This note gives a central limit theorem for the length of the longest subsequence of 
a random permutation which follows some repeating pattern. This includes the case of any fixed 
pattern of ups and downs which has at least one of each, such as the alternating case considered in 
[2] and [3]. In every case considered the convergence in the limit of long permutations is to normal 
with mean and variance linear in the length of the permutations. 



1. Setup 

An r-pattern of length k is a map $w$ from $\mathbb{Z}/k\mathbb{Z}$ to $2^{S_r} \setminus \{\emptyset\}$ where $S_r$ is a symmetric group.
The case of up and down sequences is that of 2-patterns with every $w([a])$ either $\{\mathrm{up}\} = \{(12)\}$
or $\{\mathrm{down}\} = \{(21)\}$. A sequence $\sigma \in \mathbb{R}^n$ follows an r-pattern $w$ if for every $m$ the values
$\sigma(m+1), \ldots, \sigma(m+r)$ are distinct and the associated permutation $p$ is in $w([m])$ where $p$ is defined by
having $p(i) < p(j)$ if $\sigma(m+i) < \sigma(m+j)$. Consider the uniform probability measure on each
symmetric group $S_n$ and regard $S_n$ as a subset of $[1,n]^n \subset \mathbb{R}^n$. The main random variables of
interest are $L^w_n$ mapping $S_n$ to $\mathbb{Z}_+$ with

\[ L^w_n(\sigma) = \max\{ r \mid \exists \tau : [1,r] \to [1,n] \text{ with every } \tau(i+1) > \tau(i) \text{ and } \sigma \circ \tau \text{ following } w \} \]

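the length of the longest subsequence following $w$. A sequence $\{f_n\}$ of $\mathbb{R}$-valued random variables
satisfies a clt if there are $\mu \in \mathbb{R}$ and $0 < \sigma \in \mathbb{R}$ with

\[ \lim_{n \to \infty} \mathrm{Prob}\left( f_n - \mu n < t \sigma \sqrt{n} \right) = \Phi(t) \]

for every $t \in \mathbb{R}$. Here $\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-u^2/2} \, du$ is the cumulative distribution function for the
standard normal distribution.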
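
The definitions of following a pattern and of the longest following subsequence can be made concrete with a small brute-force sketch. The helper names `follows`, `longest_following` and `ALTERNATING` are illustrative and not from the paper. The sketch assumes that permutations are written in one-line notation, so that (12) is "up", and that window offsets start at 0. The search is exponential in the sequence length, so it is only meant for checking tiny cases by hand.

```python
from itertools import combinations

# Alternating 2-pattern of length 2: w([0]) = {up}, w([1]) = {down}.
ALTERNATING = [{(1, 2)}, {(2, 1)}]

def follows(seq, w, r):
    """Return True if seq follows the r-pattern w.

    w is a list of length k; w[m] is a set of permutations of (1, ..., r),
    each written as a tuple in one-line notation (an assumption of this sketch).
    """
    k = len(w)
    for m in range(len(seq) - r + 1):
        window = seq[m:m + r]
        if len(set(window)) < r:
            return False  # the r values in a window must be distinct
        order = sorted(window)
        # p(i) < p(j) exactly when window[i] < window[j]
        p = tuple(order.index(x) + 1 for x in window)
        if p not in w[m % k]:
            return False
    return True

def longest_following(seq, w, r):
    """Brute-force length of the longest subsequence of seq following w."""
    n = len(seq)
    for length in range(n, -1, -1):
        for idx in combinations(range(n), length):
            if follows([seq[i] for i in idx], w, r):
                return length
    return 0
```

For example, `longest_following([3, 1, 4, 2], ALTERNATING, 2)` finds the subsequence 3, 4, 2 (up, then down), while for the constant up pattern `[{(1, 2)}]` the same function computes the length of a longest increasing subsequence.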

The fact that every nonconstant up and down pattern $w$ has $\{L^w_n\}$ following a clt will follow as
a corollary of the theorem below. The proof does not give good bounds on the means or variances
involved. This is in contrast to the special case of alternating subsequences, where $\mu = \frac{2}{3}$ and
$\sigma^2 = \frac{8}{45}$ are straightforward to compute.

Write $\sigma\tau = (\sigma(1), \ldots, \sigma(a), \tau(1), \ldots, \tau(b))$ for the product and $\sigma^m = \sigma \cdots \sigma$ for the $m$th power
if $\sigma \in \mathbb{R}^a$ and $\tau \in \mathbb{R}^b$ are sequences of lengths $a$ and $b$. The proof will also require the patterns to
be combinatorial, though it is not clear whether this is the best condition possible. In particular
the constant up and down patterns do not follow a clt so some condition is needed.

Call a pattern $w$ combinatorial if every pair of sequences of lengths $a$ and $b$ following $w$ have a
subsequence of their product of length at least $a + b - k$ which also follows $w$, where $k$ is the length
of $w$.

2. Theorem 

Theorem. If $w$ is a combinatorial pattern then $\{L^w_n\}$ satisfy a clt.

The following is easy: 
Corollary. Every nonconstant 2-pattern is combinatorial and hence follows a clt.

Note that there are also other nontrivial examples. For instance the constant 3-pattern of length
3 with every $w([i]) = \{(123), (231), (312)\}$ is combinatorial.



3. Proof 

The idea is to switch from uniformly chosen permutations to sequences of independently selected 
points in an interval. In this setting one can choose a positive probability event depending on only 
a short subsequence of points so that the longest subsequence following the given pattern is found 
by combining a longest one before the event and a longest one after it. The problem is thus reduced 
to a sum of independent events with good enough control on the number of these events. 

Consider the probability space $\Omega = [0,1]^{\mathbb{Z}_+}$ with the product Lebesgue measure. Write $\pi_A : \Omega \to [0,1]^A$
for the projections and $R^w_{[a,b]}$ for the random variable with

\[ R^w_{[a,b]}(\sigma) = \max\{ |A| : A \subseteq [a,b] \text{ and } \pi_A(\sigma) \text{ follows } w \} \]

the length of the longest subsequence in the interval following $w$ and $R^w_n = R^w_{[1,n]}$ for the family of
random variables of interest. Note that the push forward of the uniform measure under $L^w_n$ and
that of the Lebesgue measure under $R^w_n$ are the same measure on $\mathbb{Z}_+$ and hence one family satisfies
a clt iff the other one does.

Fix a combinatorial r-pattern w of length k. 

In the notation of [1], if $w$ drifts up then taking one long sequence following $w$ with values
above $\frac{1}{2}$ and another with values below $\frac{1}{2}$ would show that $w$ is not combinatorial, so $w$ is driftless. By
Proposition 4.10 of [1] this gives a totally driftless loop which in turn gives a permutation $\tau \in S_k$
so that every power of $\tau$ follows $w$.

The decomposing event is the subcube $B$ of $[0,1]^{4k}$ with volume $\mathrm{vol}(B) = (3k)^{-4k}$ given by

B = ( \prod_{i \in [1,k]} ( … ) )

Note that every element of B follows w. 

Here are some random variables which decompose the sequences at the $B$ events. Take $\{d_j\}$ for
$j \geq 1$ to be the iid indicator random variables for $B$ so that

\[ d_j(\sigma) = \begin{cases} 1 & \text{if } \pi_{(4kj-4k,\,4kj]}(\sigma) \in B \\ 0 & \text{otherwise} \end{cases} \]



with means $\mu_d = \mathrm{vol}(B)$ and variances $\sigma_d^2 = \mathrm{vol}(B)(1 - \mathrm{vol}(B))$. For the partial sums write
$D_A = \sum_{j \in A} d_j$ and $D_n = D_{[1,n]}$. Next build approximate inverse random variables to $D_n$ using for
$j \geq 0$ the random variables $\{p_j\}$ with

\[ p_j(\sigma) = \min\{ a_j \mid \text{there are } 0 < a_0 < \cdots < a_j \text{ with every } d_{a_i} = 1 \} \]

for the position of the $(j+1)$st occurrence of an event in $B$ and for $j \geq 1$ the iid variables $\{q_j\}$
with

\[ q_j = p_j - p_{j-1} \]

for the number of steps between events in $B$. Here a sequence of $4kn$ elements in $[0,1]$ has $n$ positions
and adjacent positions differ by a single step. Note that these are defined almost everywhere in $\Omega$.
Again write $Q_A$ and $Q_n$ for the associated partial sums. Thus $\mu_q = \mu_d^{-1} = (3k)^{4k}$, $Q_{D_n}$ is the first
position of an event in $B$ after position $n$ and $D_{p_n} = n + 1$.
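
The bookkeeping between the indicators $d_j$, the positions $p_j$ and the gaps $q_j$ can be checked on a toy example. The following is a minimal sketch with hypothetical indicator values; the function names and the list `d` are illustrative and not from the paper. It confirms the identity $D_{p_n} = n + 1$ and shows that the partial sums of the $q_j$ telescope to $p_m - p_0$.

```python
def occurrence_positions(d):
    """Positions j >= 1 with d_j = 1; entry i of the result plays the role of p_i."""
    return [j for j, dj in enumerate(d, start=1) if dj == 1]

def gaps(p):
    """q_j = p_j - p_{j-1} for j >= 1."""
    return [p[j] - p[j - 1] for j in range(1, len(p))]

def D(d, n):
    """Partial sum D_n = d_1 + ... + d_n."""
    return sum(d[:n])

# Hypothetical indicator values d_1, ..., d_8 (not from the paper).
d = [0, 1, 0, 0, 1, 1, 0, 1]
p = occurrence_positions(d)  # positions of the events
q = gaps(p)                  # steps between successive events

# D_{p_n} = n + 1: the event at position p_n is the (n + 1)st occurrence.
assert all(D(d, p[n]) == n + 1 for n in range(len(p)))

# The partial sum q_1 + ... + q_m telescopes to p_m - p_0.
assert all(sum(q[:m]) == p[m] - p[0] for m in range(len(p)))
```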

Here are some random variables involving those above and lengths of subsequences following $w$.
For $j \geq 1$ take the iid random variables $\{s_j\}$ to be

\[ s_j = R^w_{(4kp_{j-1}+2k,\,4kp_j+2k]} \]

for the length of the longest subsequences between the midpoints of successive events in $B$ which
follow $w$. Take $\{t_j\}$ to be

\[ t_j = s_j - \mu_s \mu_d q_j \]

another collection of iid random variables. Note that since $4k \leq s_j \leq 4kq_j$ the means $\mu_s$ are
finite and hence the $t_j$ are also defined almost everywhere and have means $\mu_t = 0$. Since there
are many different positive probability values for the $t_j$ their variances $\sigma_t^2$ are nonzero and since
$-\mu_s \mu_d q_j \leq t_j \leq s_j$ they are also finite. Write $S_A$, $S_n$, $T_A$ and $T_n$ for the associated partial sums.
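
The claim that the $t_j$ have mean zero is a one-line computation from the definition $t_j = s_j - \mu_s \mu_d q_j$ together with $\mu_q = \mu_d^{-1}$. Written out, with $E$ denoting expectation (notation assumed here, not used in the text):

```latex
\[
  \mu_t = E[s_j] - \mu_s \mu_d \, E[q_j]
        = \mu_s - \mu_s \mu_d \mu_q
        = \mu_s - \mu_s \mu_d \mu_d^{-1}
        = 0 .
\]
```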

Note that since $w$ is combinatorial and every event in $B$ follows $w$, every longest subsequence
of $\sigma$ following $w$ includes every interval $(4kp_j + k, 4kp_j + 3k]$, which is the middle $2k$ indices of an
event in $B$ and similarly every longest subsequence of the projection $\pi_{(4kp_i+2k,\,4kp_j+2k]}(\sigma)$ following
$w$ includes the end intervals $(4kp_i + 2k, 4kp_i + 3k]$ and $(4kp_j + k, 4kp_j + 2k]$. This implies

Lemma 1. $R^w_{(4kp_0+2k,\,4kp_{D_n}+2k]} = S_{D_n}$.

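This sum is further decomposed into (2)+(3)+(4) for easier analysis and the short ends (1) and
(5) are added to get

Lemma 2. $R^w_{4kn} = (1) + (2) + (3) + (4) + (5)$ where

\[ (1) = R^w_{[1,\,4kp_0+2k]}, \]

\[ (2) = T_{\lfloor \mu_d n \rfloor}, \]

\[ (3) = \begin{cases} T_{(\lfloor \mu_d n \rfloor,\, D_n]} & \text{if } D_n \geq \lfloor \mu_d n \rfloor \\ -T_{(D_n,\, \lfloor \mu_d n \rfloor]} & \text{otherwise,} \end{cases} \]

\[ (4) = \mu_s \mu_d Q_{D_n} \]

and

\[ (5) = -R^w_{(4kn,\,4kp_{D_n}+2k]}. \]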

For the desired large n limit the terms (1), (3) and (5) are too short to contribute, though for (3) 
this uses Kolmogorov's inequality. The mean comes from (4), which has vanishing variance and 
the variance comes from (2) with vanishing mean. 
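
The identity behind the split of the sum in Lemma 1 into (2)+(3)+(4) can be written out as follows. This is a sketch that assumes $S_m$, $T_m$ and $Q_m$ denote the partial sums of the $s_j$, $t_j$ and $q_j$ over $1 \leq j \leq m$, and it uses only the definition $t_j = s_j - \mu_s \mu_d q_j$.

```latex
\[
  S_{D_n} = \sum_{j=1}^{D_n} s_j
          = \sum_{j=1}^{D_n} \left( t_j + \mu_s \mu_d q_j \right)
          = T_{D_n} + \mu_s \mu_d Q_{D_n},
\]
\[
  T_{D_n} = T_{\lfloor \mu_d n \rfloor}
            + \left( T_{D_n} - T_{\lfloor \mu_d n \rfloor} \right)
          = (2) + (3),
  \qquad
  \mu_s \mu_d Q_{D_n} = (4).
\]
```

In either case of the definition of (3), the difference $T_{D_n} - T_{\lfloor \mu_d n \rfloor}$ is the signed sum of the $t_j$ with index between $\lfloor \mu_d n \rfloor$ and $D_n$.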

(1),(5): Since $|(1)| \leq 4kp_0 + 2k$, whose distribution is independent of $n$, there is

\[ \lim_{n \to \infty} \mathrm{Prob}\left( |(1)| > a\sqrt{4kn} \right) = 0 \]

for every $a > 0$ and similarly for (5).

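(3): Assume the first case in the definition of (3) holds. The second case is similar. The number
of $t_j$ terms in the sum is $|D_n - \lfloor \mu_d n \rfloor|$ and since $D_n$ is a sum of $n$ of the iid random variables $d_j$ the
central limit theorem gives

\[ \lim_{n \to \infty} \mathrm{Prob}\left( |D_n - \lfloor \mu_d n \rfloor| > v \sigma_d \sqrt{n} \right) = \Phi(-v) \]

and Kolmogorov's inequality gives for every $a$ and $b$ that

\[ \mathrm{Prob}\left( \max_{j \leq a} |T_{(\lfloor \mu_d n \rfloor,\, \lfloor \mu_d n \rfloor + j]}| > b \right) \leq \frac{a \sigma_t^2}{b^2}. \]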
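
Taking $a = n^{\frac{1}{2}+\epsilon}$ and $b = n^{\frac{1}{2}-\epsilon}$ for any $0 < \epsilon < \frac{1}{6}$ gives

\[ \mathrm{Prob}\left( |(3)| > \frac{\sqrt{n}}{n^{\epsilon}} \right) \leq \sigma_t^2 n^{3\epsilon - \frac{1}{2}} + \mathrm{Prob}\left( |D_n - \lfloor \mu_d n \rfloor| > n^{\epsilon} \sqrt{n} \right) \]

and hence

\[ \lim_{n \to \infty} \mathrm{Prob}\left( |(3)| > \frac{\sqrt{n}}{n^{\epsilon}} \right) = 0 + \lim_{n \to \infty} \Phi\left( -\frac{n^{\epsilon}}{\sigma_d} \right) = 0. \]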
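
The exponent arithmetic in the bound for (3) is short enough to spell out. With $a = n^{\frac{1}{2}+\epsilon}$ and $b = n^{\frac{1}{2}-\epsilon}$, the Kolmogorov bound $a \sigma_t^2 / b^2$ becomes:

```latex
\[
  \frac{a \sigma_t^2}{b^2}
  = \sigma_t^2 \, n^{\left(\frac{1}{2}+\epsilon\right) - 2\left(\frac{1}{2}-\epsilon\right)}
  = \sigma_t^2 \, n^{3\epsilon - \frac{1}{2}},
\]
\[
  3\epsilon - \tfrac{1}{2} < 0 \iff \epsilon < \tfrac{1}{6},
  \qquad
  a = n^{\epsilon} \sqrt{n} = v \sigma_d \sqrt{n}
  \ \text{ with } \ v = \frac{n^{\epsilon}}{\sigma_d}.
\]
```

So the first term tends to zero exactly when $\epsilon < \frac{1}{6}$, and the second term is controlled by the central limit theorem for $D_n$ with a threshold $v$ that grows with $n$.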



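(4): Since $Q_{D_n}$ is the position of first occurrence after $n$ of an event in $B$, $\mathrm{Prob}(Q_{D_n} - n > a) = (1 - \mu_d)^a$. Thus for any $u > 0$

\[ \lim_{n \to \infty} \mathrm{Prob}\left( |(4) - \mu_s \mu_d n| > u\sqrt{n} \right) = \lim_{n \to \infty} \mathrm{Prob}\left( Q_{D_n} - n > \frac{u\sqrt{n}}{\mu_s \mu_d} \right) = 0. \]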



(2): Since the second term is a sum of $\lfloor \mu_d n \rfloor$ iid mean zero variables $t_j$, the central limit theorem
gives for every $u$ that

\[ \lim_{n \to \infty} \mathrm{Prob}\left( (2) < u \sigma_t \sqrt{\mu_d n} \right) = \Phi(u). \]

Adding these gives for every $u$ the desired

\[ \lim_{n \to \infty} \mathrm{Prob}\left( R^w_{4kn} - \mu_s \mu_d n < u \sigma_t \sqrt{\mu_d} \sqrt{n} \right) = \Phi(u). \]

4. Further Directions 

The mean length for any pattern is roughly controlled by the drift in that patterns with drift
have a mean length of order $\sqrt{n}$ by a comparison to the increasing case, while those without have
mean linear in $n$. It is not clear whether the combinatorial property is needed for a central limit
theorem.